Abstract: The growth of the Internet has produced an abundance of user-generated data, from which data mining techniques can extract useful information. Among the various types of generated data, reviews of products, businesses, and services are becoming increasingly important. Nowadays, online reviews are often the primary factor and a valuable source of information in customers’ purchase or service decisions. The influence of peer reviews has attracted spammers who post fake and unrealistic reviews. Some online review systems facilitate interaction between customers, who share opinions on products or services, in order to improve the systems’ utility and the customer experience. Because the large volume of public opinion directly or indirectly affects the marketing of products and services, manufacturers have taken an interest in online reviews. Observing how much customers rely on reviews, some vendors try to "Fake It!", thus misleading customers. Even when aware that reviews may be manipulated, customers cannot distinguish fake reviews from genuine ones, which necessitates a system that filters reviews. In this paper, we propose a dual-layer classification approach based on a two-level filtering method: metadata analysis first, followed by review content analysis. In the first level of classification, we consider metadata parameters (IP address, time) to decide the truthfulness of a review. The reviews classified as real may still contain suspicious reviews, which calls for a second level of classification that uses review content features and reviewer-centric features to detect review spam. At both levels, an auto-learning system is built that learns from the user’s past history in the system, which reduces the computational time when new data is fed into the system.
A comparative study is carried out, in which the proposed model performs better than the other techniques considered.
Keywords: primary factor and a valuable source, reliability of customers.
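As an illustration only, the two-level filtering described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the model evaluated in the paper: the `Review` fields, the thresholds, the sliding-window rule on IP address and time, the Jaccard near-duplicate check standing in for the content features, and the `history` dictionary standing in for the auto-learning system are all hypothetical choices made for the example. Reviewer-centric features are reduced here to the remembered verdict per user.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Review:
    user: str         # reviewer identifier (hypothetical field)
    ip: str           # metadata: IP address
    timestamp: float  # metadata: posting time in seconds
    text: str         # review content


# Hypothetical thresholds, chosen only for this example.
MAX_PER_IP_WINDOW = 2   # more reviews than this from one IP in a window is suspicious
WINDOW_SECONDS = 3600
MAX_SIMILARITY = 0.8    # Jaccard similarity at or above this counts as near-duplicate


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity between two texts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def level1_metadata(reviews, indices):
    """Level 1: flag bursts of reviews from one IP inside a time window."""
    by_ip = defaultdict(list)
    for i in indices:
        by_ip[reviews[i].ip].append(i)
    flagged = set()
    for idxs in by_ip.values():
        idxs.sort(key=lambda i: reviews[i].timestamp)
        start = 0
        for end in range(len(idxs)):
            # Shrink the window until it spans at most WINDOW_SECONDS.
            while (reviews[idxs[end]].timestamp
                   - reviews[idxs[start]].timestamp) > WINDOW_SECONDS:
                start += 1
            if end - start + 1 > MAX_PER_IP_WINDOW:
                flagged.update(idxs[start:end + 1])
    return flagged


def level2_content(reviews, indices):
    """Level 2: flag pairs of reviews whose text is nearly identical."""
    flagged = set()
    for pos, i in enumerate(indices):
        for j in indices[pos + 1:]:
            if jaccard(reviews[i].text, reviews[j].text) >= MAX_SIMILARITY:
                flagged.update((i, j))
    return flagged


def classify(reviews, history):
    """Label each review 'spam' or 'genuine'; `history` is updated in place."""
    labels, pending = {}, []
    for i, review in enumerate(reviews):
        # Stand-in for auto-learning: reuse the stored verdict for known users.
        if history.get(review.user) == "spam":
            labels[i] = "spam"
        else:
            pending.append(i)
    level1 = level1_metadata(reviews, pending)
    remaining = [i for i in pending if i not in level1]
    level2 = level2_content(reviews, remaining)
    for i in pending:
        if i in level1 or i in level2:
            labels[i] = "spam"
            history[reviews[i].user] = "spam"
        else:
            labels[i] = "genuine"
    return labels
```

In this sketch, a review from a user already recorded in `history` is labelled without running either level, which is one simple way the stored history could reduce computation when new data arrives.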